Genome sequence of Phaeobacter caeruleus type strain (DSM 24564T), a surface-associated member of the marine Roseobacter clade

In 2009 Phaeobacter caeruleus was described as a novel species affiliated with the marine Roseobacter clade, which, in turn, belongs to the class Alphaproteobacteria. The genus Phaeobacter is well known for members that produce various secondary metabolites. Here we report on putative quorum sensing systems, based on the finding of six N-acyl-homoserine lactone synthetases, and show that the blue color of P. caeruleus is probably due to the production of the secondary metabolite indigoidine. Therefore, P. caeruleus might have inhibitory effects on other bacteria. In this study the genome of the type strain DSM 24564T was sequenced, annotated and characterized. The 5,344,419 bp long genome with its seven plasmids contains 5,227 protein-coding genes (3,904 with a predicted function) and 108 RNA genes.


Introduction
Phaeobacter caeruleus 13T (= DSM 24564 = LMG 24369 = CCUG 55859) was isolated at the ISMAR-CNR Marine Station, Genoa, Italy, during an analysis of the microbial diversity of a marine electroactive biofilm from a tank of about 100 L seawater [1]. The biofilm was grown on a cathodically polarized stainless-steel cathode [2]. In addition to P. caeruleus, the genus consists of four other species, P. arcticus, P. daeponensis, P. gallaeciensis and P. inhibens, and belongs to the Roseobacter clade, one of the most intensively studied groups of marine bacteria in recent years [3]. The clade belongs to the family Rhodobacteraceae within the class Alphaproteobacteria. P. caeruleus is named after the colony color of the isolates (cae.ru'le.us; L. masc. adj. caeruleus = dark-blue colored) [1]. Since the first publication, no further research on P. caeruleus has been published. Therefore, we present for the first time a description and analysis of the high-quality draft genome sequence and annotation, including insights into genes coding for putative secondary metabolites such as the blue pigment indigoidine or the quorum sensing mediating N-acyl-homoserine lactones. Furthermore, we summarize features of the organism, including novel aspects of its phenotype.

Classification and features
16S rRNA gene analysis
Figure 1 shows the phylogenetic neighborhood of P. caeruleus in a tree based on 16S rRNA gene sequences. The sequences of the four 16S rRNA gene copies in the genome do not differ from each other, and do not differ from the previously published 16S rRNA gene sequence (AM943630), which contains two ambiguous base calls.
A representative genomic 16S rRNA gene sequence of P. caeruleus 13T was compared using NCBI BLAST [25,26] under default settings (e.g., considering only the high-scoring segment pairs (HSPs) from the best 250 hits) with the most recent release of the Greengenes database [27], and the relative frequencies of taxa and keywords (reduced to their stem [28]) were determined, weighted by BLAST scores (Table 1). The most frequently occurring genera were Phaeobacter (38.5%), Ruegeria (18.6%), Roseobacter (15.0%), Silicibacter (11.9%) and Leisingera (5.5%) (74 hits in total). Regarding the single hit to sequences from members of the species, the average identity within HSPs was 100.0%, whereas the average coverage by HSPs was 96.9%. Regarding the nine hits to sequences from other members of the genus, the average identity within HSPs was 97.6%, whereas the average coverage by HSPs was 99.5%. Among all other species, the one yielding the highest score was Phaeobacter gallaeciensis (AY881240), which corresponded to an identity of 98.3% and an HSP coverage of 99.3%. (Note that the Greengenes database uses the INSDC (= EMBL/NCBI/DDBJ) annotation, which is not an authoritative source for nomenclature or classification.) The highest-scoring environmental sequence was EF573869 (Greengenes short name 'site S25 near Coco's Island marine clone S25 213'), which showed an identity of 98.8% and an HSP coverage of 99.9%.
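The score weighting described above can be illustrated with a minimal sketch. It assumes that each BLAST hit contributes its score to the taxon named in its label and that the relative frequency is that taxon's share of the summed scores; the function name and the example hits are hypothetical and are not the data behind Table 1.

```python
from collections import defaultdict


def weighted_frequencies(hits):
    """Return {taxon: percent}, with each hit weighted by its BLAST score.

    `hits` is an iterable of (taxon, score) pairs. This weighting scheme is
    an assumption made for illustration only.
    """
    totals = defaultdict(float)
    for taxon, score in hits:
        totals[taxon] += score
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {taxon: 100.0 * s / grand_total for taxon, s in totals.items()}


# Hypothetical hits (genus, score); invented numbers, not results of the study.
example_hits = [
    ("Phaeobacter", 2500.0),
    ("Phaeobacter", 2400.0),
    ("Ruegeria", 2300.0),
    ("Leisingera", 2200.0),
]

freqs = weighted_frequencies(example_hits)
for genus, pct in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(f"{genus}: {pct:.1f}%")
```

Weighting by score rather than counting hits means that a genus represented by a few strong matches can outrank one represented by many weak matches.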
The most frequently occurring keywords within the labels of all environmental samples which yielded hits were 'coral' (6.8%), 'caribbean' (5.8%), 'faveolata' (5.5%), 'chang' (5.4%) and 'diseaseinduc, montastraea, plagu, white' (5.2%) (169 hits in total). Environmental samples which yielded hits of a higher score than the highest scoring species were not found, indicating that the species is rarely found in environmental samples.
Figure 1. Phylogenetic tree highlighting the position of P. caeruleus relative to the type strains of the other species within the genus Phaeobacter and the neighboring genera Leisingera and Oceanicola [4][5][6][7][8][9][10][11][12][13][14][15][16][17]. The tree was inferred from 1,387 aligned characters [18,19] of the 16S rRNA gene sequence under the maximum likelihood (ML) criterion [20]. Oceanicola spp. were included in the dataset for use as outgroup taxa. The branches are scaled in terms of the expected number of substitutions per site. Numbers adjacent to the branches are support values from 1,000 ML bootstrap replicates [21] (left) and from 1,000 maximum-parsimony bootstrap replicates [22] (right) if larger than 60%. Lineages with type strain genome sequencing projects registered in GOLD [23] are labeled with one asterisk, those also listed as 'Complete and Published' with two asterisks [24]. New genome sequences are reported in this issue [9].

Morphology and physiology
P. caeruleus 13T cells are Gram-negative rods with a cell size of 0.9-1.8 µm (Figure 2). Bundles of polar flagella and inclusion bodies were observed by transmission electron microscopy (not visible in Figure 2). On marine agar the cells grow in round colonies with a surface of dark and bright blue circles, which becomes darker with incubation time [1]. The utilization of carbon compounds by P. caeruleus DSM 24564T grown at 20°C was also determined for this study using Generation-III microplates in an OmniLog phenotyping device (BIOLOG Inc., Hayward, CA, USA).
The microplates were inoculated at 28°C with a cell suspension at a cell density of 95-96% turbidity and dye IF-A. Further additives included vitamins, micronutrient and sea-salt solutions. The exported measurement data were further analyzed with the opm package for R [39,40], using its functionality for statistically estimating parameters from the respiration curves such as the maximum height, and automatically translating these values into negative, ambiguous, and positive reactions. The strain was studied in two independent biological replicates, and reactions with a different behavior between the two repetitions were regarded as ambiguous. At 28°C, the strain reacted poorly, with positive reactions only for 1% NaCl, 4% NaCl, lithium chloride, propionic acid and sodium bromate. This might be due to the optimum reported growth temperature of 20°C, whereas the phenotypic measurements were carried out at 28°C. The result is in accordance with our observation that after incubation for 24 h in marine broth 2216 medium (MB; BD Biosciences, Franklin Lakes, NJ) and shaking at 100 rpm, P. caeruleus DSM 24564T shows visible growth at 20°C but not at 28°C. Note, however, that [1] reported at least some growth for temperatures up to 45°C.
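The rule for combining the two biological replicates can be sketched as follows. This is an illustration in Python and not the opm package itself, which is written in R and estimates its cutoffs statistically; the fixed thresholds, function names and substrate labels below are assumptions chosen only to show the logic.

```python
def discretize(max_height, low, high):
    """Translate an estimated maximum curve height into a reaction call.

    `low` and `high` are hypothetical cutoffs; opm derives its own.
    """
    if max_height < low:
        return "negative"
    if max_height > high:
        return "positive"
    return "ambiguous"


def combine_replicates(call_1, call_2):
    """Replicates that disagree are regarded as ambiguous, as in the text."""
    return call_1 if call_1 == call_2 else "ambiguous"


def call_reactions(rep_1, rep_2, low=100.0, high=150.0):
    """Return one call per substrate from two replicate measurements."""
    return {
        substrate: combine_replicates(
            discretize(rep_1[substrate], low, high),
            discretize(rep_2[substrate], low, high),
        )
        for substrate in rep_1
    }


# Hypothetical maximum heights for four placeholder substrates.
replicate_1 = {"substrate_A": 210.0, "substrate_B": 60.0,
               "substrate_C": 90.0, "substrate_D": 120.0}
replicate_2 = {"substrate_A": 200.0, "substrate_B": 55.0,
               "substrate_C": 190.0, "substrate_D": 130.0}

calls = call_reactions(replicate_1, replicate_2)
for substrate, call in calls.items():
    print(substrate, call)
```

Under this rule a reaction is reported as positive or negative only when both replicates agree, which keeps single-plate outliers out of the final phenotype.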

Chemotaxonomy
Major fatty acids of P. caeruleus 13T are C18:1ω7c, C16:0, an unknown fatty acid with an equivalent chain-length value of 11.7999, C10:0 3-OH, C16:0 2-OH, C12:0 3-OH, 11-methyl C18:1ω7c and C18:0. The remaining fatty acids were present only in minor fractions of less than 1% of the total [1].
Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). Evidence codes are from the Gene Ontology project [38].

Genome project history
This organism was selected for sequencing on the basis of the DOE Joint Genome Institute Community Sequencing Program 2010, CSP 441: "Whole genome type strain sequences of the genera Phaeobacter and Leisingera - a monophyletic group of physiologically highly diverse organisms". The genome project is deposited in the Genomes On Line Database [41] and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI) using state-of-the-art technology [42]. A summary of the project information is shown in Table 2.

Growth conditions and DNA isolation
A culture of P. caeruleus DSM 24564T was grown in DSMZ medium 514 [43] at 20°C. Genomic DNA was isolated using the Jetflex Genomic DNA Purification Kit (GENOMED 600100) following the standard protocol provided by the manufacturer, but modified by the use of an additional 10 µl of proteinase K and an incubation time of 40 min. DNA is available through the DNA Bank Network [44].

Genome sequencing and assembly
The draft genome sequence was generated using Illumina sequencing technology. For this genome, we constructed and sequenced an Illumina short-insert paired-end library with an average insert size of 270 bp, which generated 5,484,184 reads, and an Illumina long-insert paired-end library with an average insert size of 7,670 ± 2,475 bp, which generated 4,839,808 reads, totaling 1,549 Mb of Illumina data (Feng Chen, unpublished). All general aspects of library construction and sequencing performed can be found at the JGI web site [45]. The initial draft assembly contained 54 contigs in 17 scaffolds. The initial draft data was assembled with Allpaths [46] and the consensus was computationally shredded into 10 kbp overlapping fake reads (shreds). The Illumina draft data was also assembled with Velvet [47], and the consensus sequences were computationally shredded into 1.5 kbp overlapping fake reads (shreds). The Illumina draft data was assembled again with Velvet using the shreds from the first Velvet assembly to guide the next assembly. The consensus from the second Velvet assembly was shredded into 1.5 kbp overlapping fake reads.
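Shredding a consensus into overlapping fake reads can be illustrated with a short sketch. The read lengths of 10 kbp and 1.5 kbp are taken from the text; the amount of overlap is not stated, so the `overlap` parameter and the function name are assumptions, and this is not the JGI implementation.

```python
def shred(consensus, read_length, overlap):
    """Cut `consensus` into reads of `read_length` that overlap by `overlap`.

    The last read may be shorter than `read_length`. The overlap value is a
    free parameter here because the source does not specify it.
    """
    if not 0 <= overlap < read_length:
        raise ValueError("overlap must be >= 0 and smaller than read_length")
    if not consensus:
        return []
    step = read_length - overlap
    reads = []
    start = 0
    while True:
        reads.append(consensus[start:start + read_length])
        if start + read_length >= len(consensus):
            break
        start += step
    return reads


# Toy consensus of 22,000 bp shredded into 10 kbp reads with a 5 kbp overlap.
toy_consensus = "ACGT" * 5500
shreds = shred(toy_consensus, read_length=10_000, overlap=5_000)
print(len(shreds), [len(s) for s in shreds])
```

Because consecutive shreds share sequence, an assembler that receives them alongside the real reads can rebuild the original consensus while remaining free to merge it with data from the other assemblies.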

Genome annotation
Genes were identified using Prodigal [49] as part of the JGI genome annotation pipeline [50], followed by a round of manual curation using the JGI GenePRIMP pipeline [51]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes - Expert Review (IMG-ER) platform.

Genome properties
The genome statistics are provided in Table 3 and Figure 3. The assembly of the genome sequence consists of three large scaffolds for the chromosome (3,520,924 bp, 564,457 bp and 447,629 bp in length, respectively) and six plasmids with sizes of 21,535 bp to 270,810 bp and a total G+C content of 63.3%. Of the 5,335 genes predicted, 5,227 were protein-coding genes and 108 were RNA genes; 81 pseudogenes were also identified. The majority of the protein-coding genes (73.2%) were assigned a putative function while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COGs functional categories is presented in Table 4.
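The figures quoted above can be cross-checked with simple arithmetic. The sketch below uses only numbers given in the abstract and in this section. The observation that 73.2% corresponds to the share of all predicted genes, rather than of protein-coding genes only, follows from the calculation and is not stated by the authors.

```python
# Values taken from the abstract and from the genome properties section.
total_genome_bp = 5_344_419
chromosome_scaffolds_bp = [3_520_924, 564_457, 447_629]
protein_coding_genes = 5_227
rna_genes = 108
genes_with_predicted_function = 3_904

chromosome_bp = sum(chromosome_scaffolds_bp)
plasmid_bp = total_genome_bp - chromosome_bp  # combined size of the plasmids
total_genes = protein_coding_genes + rna_genes

share_of_all_genes = 100.0 * genes_with_predicted_function / total_genes
share_of_protein_coding = (
    100.0 * genes_with_predicted_function / protein_coding_genes
)

print(f"chromosomal scaffolds: {chromosome_bp:,} bp")
print(f"plasmids (by difference): {plasmid_bp:,} bp")
print(f"total genes: {total_genes:,}")
print(f"with function, of all genes: {share_of_all_genes:.1f}%")
print(f"with function, of protein-coding genes: {share_of_protein_coding:.1f}%")
```

The three chromosomal scaffolds sum to 4,533,010 bp, leaving 811,409 bp for the plasmids, and the gene counts add up to the 5,335 predicted genes.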

Insights into the genome
Genome sequencing of Phaeobacter caeruleus DSM 24564T resulted in nine scaffolds (contigs) with sizes between 22 kb and 3.5 Mb (Table 5). The largest scaffold represents the chromosome as indicated by the presence of the typical replication initiation protein DnaA (Caer_2072), and the same affiliation can be assumed for scaffold 3 based on the absence of plasmid replication genes. The presence of more than 30 tRNA genes and CRISPRs (Clustered Regularly Interspaced Short Palindromic Repeats), which provide acquired resistance against viruses [52], on scaffold 2 is indicative of the chromosome.
However, scaffold 2 also contains a complete RepABC operon with genes for plasmid replication initiation (RepC-11; unpublished replication type) and partitioning (RepAB) as well as a perfect palindrome 5'-TTTACCG' that probably represents a functional cis-acting anchor for plasmid partitioning [53]. This peculiar distribution may indicate either the integration of a RepABC-11 type plasmid into the chromosome via recombination or an "outsourcing" of essential chromosomal genes to a plasmid, as has recently been documented for the photosynthesis gene cluster of Roseobacter litoralis [54].
The presence of plasmid replication modules on the remaining six fragments with sizes between 22 and 271 kb indicates that they all represent extrachromosomal elements, but their circularity has not been experimentally validated (Table 5). Three of the putative plasmids also contain RepABC-type operons representing the compatibility groups C-2, C-8 and C-12 [53]. The three remaining plasmids pCaer_C109, pCaer_D95 and pCaer_F22 represent DnaA-like I, RepB-I and RepA-I type plasmids, respectively [55,56]. The smallest plasmid pCaer_F22 contains the RepA-I type replicase, but a partitioning module is lacking. This distribution may correspond to a higher plasmid copy number within the cell, thus ensuring maintenance of the replicon in the daughter cells after cell division.
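The reasoning used above to assign scaffolds to the chromosome or to plasmids can be summarized as a small decision rule. The sketch below is only an illustration of that reasoning: the class, field names and category labels are hypothetical, and flags that the text does not mention for a scaffold are simply left at their default of False.

```python
from dataclasses import dataclass


@dataclass
class Scaffold:
    name: str
    has_dnaa: bool = False                   # chromosomal initiator DnaA
    has_chromosomal_hallmarks: bool = False  # e.g. many tRNA genes, CRISPRs
    has_plasmid_replicon: bool = False       # e.g. RepABC, RepA, RepB, DnaA-like


def classify(scaffold):
    """Assign a replicon type from marker genes (illustrative rule only)."""
    chromosomal = scaffold.has_dnaa or scaffold.has_chromosomal_hallmarks
    if chromosomal and scaffold.has_plasmid_replicon:
        return "ambiguous"
    if chromosomal:
        return "chromosome"
    if scaffold.has_plasmid_replicon:
        return "putative plasmid"
    return "chromosome (assumed, no plasmid replicon)"


# Flags follow the description in the text; unmentioned flags stay False.
scaffolds = [
    Scaffold("scaffold 1", has_dnaa=True),
    Scaffold("scaffold 2", has_chromosomal_hallmarks=True,
             has_plasmid_replicon=True),
    Scaffold("scaffold 3"),
    Scaffold("pCaer_F22", has_plasmid_replicon=True),
]

assignments = {s.name: classify(s) for s in scaffolds}
for name, label in assignments.items():
    print(f"{name}: {label}")
```

Scaffold 2 falls into the ambiguous category because it carries both chromosomal hallmarks and a RepABC operon, which is the case discussed in the text as either plasmid integration or outsourcing of chromosomal genes.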
The locus tags of all replicases, plasmid stability modules and the large virB4 and virD4 genes of type IV secretion systems are presented in Table 6. The plasmids pCaer_B246 and pCaer_C109 contain postsegregational killing systems (PSKs) consisting of a typical operon with two small genes encoding a stable toxin and an unstable antitoxin [57]. The largest plasmid pCaer_A271 contains a complete type IV secretion system including the virB operon for the formation of a transmembrane channel. The relaxase VirD2, which is required for the strand-specific DNA nicking at the origin of transfer (oriT), and the coupling protein VirD4 support the presence of a functional conjugation system [58,59]. The DnaA-like I replicon pCaer_C109 contains a large type VI secretion system (T6SS) with a size of about 30 kb. This export system was first described in the context of bacterial pathogenesis, but recent findings indicate a more general physiological role in defense against eukaryotic cells and other bacteria in the environment [60].
Several strains affiliated with the Roseobacter clade show a high potential to produce secondary metabolites [61]. Pigmentation of cells is often related to secondary metabolite production [62]. We assume that the characteristic blue color of P. caeruleus is attributable to the production of the blue pigment indigoidine. In the closely related and blue-colored Phaeobacter sp. strain Y4I, indigoidine is produced via a non-ribosomal peptide synthetase (NRPS)-based biosynthetic pathway encoded by the gene cluster igiBCDFE [63]. In strain Y4I, indigoidine production is correlated with pleiotropic effects, such as motility, resistance to hydrogen peroxide, surface colonization and inhibition of Vibrio fischeri. A cluster analysis revealed that the P. caeruleus plasmid pCaer_B246 contains a homologous igiBCDFE gene cluster (Caer_4407-Caer_4412). Thus it seems likely that P. caeruleus can also produce the antimicrobial secondary metabolite indigoidine via its NRPS cluster. Therefore, indigoidine could be the pigment responsible for the blue color, and P. caeruleus could have inhibitory effects on other bacteria. Mutants in either of the two LuxIR systems in Phaeobacter sp. strain Y4I lack indigoidine production; therefore, quorum sensing seems to play a role in its biosynthesis [63]. A correlation between quorum sensing, pigmentation and antimicrobial effects is already known for members of the Roseobacter clade. The LuxIR-type
Recently, siderophore production was shown for P. inhibens DSM 17395 [65]. Distinct siderophore transport systems, such as an ABC-type enterobactin transport system, two ABC-type cobalamin/Fe3+-siderophore transport systems, two ABC-type Fe3+-siderophore transport systems, two ABC-type Fe3+-hydroxamate transport systems, a TonB-dependent siderophore receptor and a siderophore-interacting protein, are encoded in the genome of P. caeruleus (Caer_4537, Caer_1186, Caer_4536, Caer_1187, Caer_4538, Caer_1188, Caer_4539, Caer_4530, Caer_4535). However, only one gene, encoding a phosphopantetheinyl transferase component of a siderophore synthetase, is associated with siderophore biosynthesis (Caer_3105). As the strain was isolated from a biofilm and siderophore-transport-associated genes are present, we presume that P. caeruleus DSM 24564T utilizes siderophores produced by other ambient bacteria [66].
The phylogenetic tree of the 16S rRNA gene analysis (Figure 1), with intermingled Phaeobacter and Leisingera species, indicates that the classification of P. caeruleus DSM 24564T might need to be reconsidered. Hence, we conducted a preliminary phylogenomic analysis using GGDC [67][68][69] and the draft genomes of the type strains of the other Leisingera and Phaeobacter species. The results shown in Table 7 indicate that the DNA-DNA hybridization (DDH) similarities calculated in silico for P. caeruleus DSM 24564T compared to other Phaeobacter species are, in general, not higher than those to Leisingera species. Although the highest value by far was obtained for P. daeponensis, it was immediately followed by L. aquimarina and L. methylohalidivorans, which is in accordance with Figure 1.
[67] for details. The distance formulas are explained in [68]. The numbers in parentheses are IMG object IDs (GenBank accession number in the case of P. gallaeciensis) identifying the underlying genome sequences.
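For orientation, the kind of HSP-based distance that underlies in silico DDH estimates can be sketched as below. This is a simplified illustration under stated assumptions and not the GGDC implementation: it assumes that the three formulas relate HSP length to total genome length, identities to HSP length, and identities to total genome length, that the totals are summed over both BLAST directions, and that the total length is the sum of both genome lengths. The conversion of a distance into a DDH estimate is not reproduced here; the formulas actually used are those explained in [68]. All input numbers are hypothetical.

```python
def hsp_based_distances(hsp_length, identities, genome_length_x, genome_length_y):
    """Return three simplified HSP-based intergenomic distances.

    hsp_length: total length of all HSPs (assumed summed over both directions)
    identities: total number of identical positions within those HSPs
    All formulas here are simplified assumptions made for illustration.
    """
    total_length = genome_length_x + genome_length_y
    if hsp_length <= 0 or total_length <= 0:
        raise ValueError("lengths must be positive")
    return {
        "formula_1": 1.0 - hsp_length / total_length,  # HSP length / total length
        "formula_2": 1.0 - identities / hsp_length,    # identities / HSP length
        "formula_3": 1.0 - identities / total_length,  # identities / total length
    }


# Hypothetical pair of 5 Mb genomes; these are not values from Table 7.
distances = hsp_based_distances(
    hsp_length=6_000_000,
    identities=5_400_000,
    genome_length_x=5_000_000,
    genome_length_y=5_000_000,
)
for name, value in distances.items():
    print(f"{name}: {value:.2f}")
```

A distance that depends only on identities within HSPs, as in the second formula of this sketch, does not involve genome length and is therefore less sensitive to missing sequence, which matters when draft genomes are compared.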